Abstract

Few of the hypotheses and predictions of food web theory can be readily tested using empirical data. An exception is represented by simple probabilistic models for food web structure, for which the likelihood has been derived. Here I test the performance of a more complex model for food web structure that is grounded in the allometric scaling of interactions with body size and the theory of optimal foraging (Allometric Diet Breadth Model, ADBM). This deterministic model has been evaluated by measuring the fraction of trophic relations correctly predicted. I contrast this value with that produced by simpler models based on body sizes and find that the data do not favor the more complex model: the information on allometric scaling and optimal foraging does not significantly increase the fit to the data. I also take a different approach and compute the p-value for the fraction of trophic interactions correctly predicted by the ADBM with respect to three probabilistic null models. I find that the ADBM is clearly better at predicting links than random graphs, but other models can do even better. Although optimal foraging and allometric scaling could improve our understanding of food webs, the models need to be improved to find support in the data.
Introduction

Understanding the main forces shaping the topology of food webs (networks depicting who eats whom in an ecosystem) is a central problem in ecology that has received a lot of attention (Cohen et al. 1990, Williams and Martinez 2000, Cattin et al. 2004, Allesina et al. 2008, Allesina and Pascual 2009). This problem has typically been investigated using simple probabilistic models, but recently models that explicitly incorporate relevant biological quantities in their assumptions have started appearing in the literature (Loeuille and Loreau 2005, Rossberg et al. 2006, Petchey et al. 2008).



In a work that investigated the role of body size and optimal foraging theory in shaping food web structure, Petchey et al. (2008) assessed the goodness of variants of their main model by measuring the proportion of empirical connections a model is able to predict. If a model proposes K connections among species, of which M are present in the empirical data set, then the proportion of correct links (overlap) is Q = M/K. They measured this overlap for their Allometric Diet Breadth Model (ADBM), and they showed that the best version of the ADBM is able to correctly predict, depending on the empirical network examined, between 5% and 65% of the proposed links (Petchey et al. 2008). The ADBM is based on two main ideas: optimal foraging theory and allometric scaling of relevant quantities with body size (Beckerman et al. 2006, Petchey et al. 2008). The ADBM also differs from most previous models because it is not probabilistic: given an empirical network, body sizes for all the species and the number of links in the network, 4 further parameters dealing with foraging are optimized numerically and a single network is produced deterministically. Here I compare the ADBM with simpler deterministic models that include information on body size but do not make use of allometric scaling and optimal foraging, and I find that the data do not support the use of the more complex model.

Also, I derive a p-value for the Q produced by the ADBM using as a reference a random digraph (Erdős and Rényi 1960), a variation of the cascade model (Cohen et al. 1990) and a recently proposed group-based random digraph (Allesina and Pascual 2009). The derivation of the probability mass function for these simple models is a step forward in the analysis of more complex models for food web structure, for which the derivation of a likelihood can be almost impossible. The derivation presented here can help associate statistical significance with the results of highly complex models, such as those based on evolving networks or systems of differential equations (Caldarelli et al. 1998).



Results show that the ADBM performs significantly better than the ran- 
dom graph in terms of overlap. It also performs better than the cascade 
model analyzed here in most of the cases. The performance is significantly 
worse than that of the group-based random digraph. 

In summary, even though allometric scaling and optimal foraging have 
the potential to illuminate the topology of food webs, the present models do 
not provide enough evidence to support this claim. 
Methods

The Allometric Diet Breadth Model 

Here I briefly describe the ADBM in its "Ratio" incarnation, which is the one that produces the best fit to the empirical data. A more detailed description of the model and its variations can be found in the original articles (Beckerman et al. 2006, Petchey et al. 2008).

The model takes as input a vector B that describes the species' body sizes. The model also requires the number of links (L) of the empirical food web one wants to replicate, which is used in the numerical optimization routine. The model then uses four other parameters ($a$, $a_1$, $a_2$ and $b$) that determine the foraging behavior of the species in the food web. The model consists of two steps: a) compute, for each predator, the profitability of each possible prey; b) compute a diet breadth (i.e. number of prey) for each predator: this number is chosen to maximize the rate of energy intake. Repeating the two steps for all the consumers produces a food web.

Here is a detailed description of the two steps outlined above: 

1. Profitability. The profitability $P_{ij}$ of prey i for consumer j is defined as:

   [Eq. 1: expression missing from the source text]    (1)

   where $B_i$ is the body size of species i and b is a positive parameter.

2. Diet Breadth. A predator j will prey upon z species, where z is the value in 0, 1, ..., S that maximizes the function:

   f(z, j) = [expression garbled in the source text]    (2)

   where σ is the permutation that orders the prey according to decreasing profitability: if z = 1 is the value that maximizes f(z, j), then consumer j will choose only the most profitable prey, if z = 2 then it will choose the two most profitable prey, and so on. This apparently complicated function is easily justifiable in terms of optimal foraging. The three parameters $a$, $a_1$ and $a_2$ are needed for computing the attack rate, and the parameter b is involved in the computation of the handling time.

Repeating the two steps for all consumers generates a food web that is compared with the empirical data. The performance of the model is measured as the fraction of predicted links that match those in the empirical food web. If an instance of the ADBM for a given network produces K links, of which M are present in the empirical food web, then the proportion of links correctly predicted, or overlap, is Q = M/K. The parameters $a$, $a_1$, $a_2$ and $b$ are optimized numerically so that a) the model correctly predicts the total number of links in the network (K ≈ L) and b) Q is maximized.

Running the ADBM for the 9 published food webs examined here yields Q ∈ [0.08, 0.65] (Table 1).
Four simple models based on body size

One of the characteristics of the ADBM is that it produces interval networks: when species are ordered according to body size, all the prey of a given predator are adjacent. Food webs are known to be quasi-interval (Williams and Martinez 2000, Cattin et al. 2004, Stouffer et al. 2006, Allesina et al. 2008), and this could be a main driver of the performance of the ADBM. It therefore makes sense to compare its performance with that of models that retain intervality but do not contain extra information regarding optimal foraging and allometric scaling. Of all possible models, I analyze here four that have the virtue of being very simple and sharing the same structure. For each possible predator-prey pair, one computes a value that depends on the body sizes of predator and prey: $Z_{ij} = f(B_i, B_j)$. If $a < Z_{ij} \le b$, where a and b are food web-dependent parameter estimates, one draws a connection. If $Z_{ij}$ is not included in the interval (a, b], no connection is drawn. In what follows, I analyze four different $f(B_i, B_j)$:

1. "Diff": $f(B_i, B_j) = B_i - B_j$. The difference between predator (i) and prey (j) sizes must fall in (a, b] to draw a connection.

2. "Ratio": $f(B_i, B_j) = B_i / B_j$. The ratio between the body sizes is what drives the structure of the food web.

3. "LogRatio": $f(B_i, B_j) = \ln(B_i + 1) / \ln(B_j + 1)$, where 1 is added so that the function is positive for all possible body sizes.

4. "DiffRatio": $f(B_i, B_j) = (B_j - B_i)(B_i / B_j)$. This model combines the first two models. Note that the function is very similar to Eq. 1 (profitability).

All four models produce interval networks, are deterministic in nature (like the ADBM), and require the optimization of two parameters (a, b), which can be easily accomplished by trying all relevant combinations.

For each model/food web, I optimize a and b so that a) the number of links produced is similar to the one measured empirically: if the ADBM proposes K links and |K − L| = t, I accept as possible solutions only those whose number of connections is in [L − t, L + t] (this is to ensure that the comparison is fair); and b) among all solutions satisfying the previous requirement, I choose the one that maximizes Q. Contrasting these values with those produced by the ADBM can help us determine whether optimal foraging and allometric scaling do play a crucial role in predicting the links in the food web.
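The search over (a, b) described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used for the analysis: the function names, the representation of a food web as a set of (predator, prey) index pairs, the convention Q = 0 when no link is proposed, and the toy body sizes are all hypothetical.

```python
import math

# The four body-size functions listed above; i is the predator, j the prey.
MODELS = {
    "Diff": lambda bi, bj: bi - bj,
    "Ratio": lambda bi, bj: bi / bj,
    "LogRatio": lambda bi, bj: math.log(bi + 1) / math.log(bj + 1),
    "DiffRatio": lambda bi, bj: (bj - bi) * (bi / bj),
}

def predict_links(body, f, a, b):
    """Return the set of (predator, prey) pairs with a < f(B_i, B_j) <= b."""
    n = len(body)
    return {(i, j) for i in range(n) for j in range(n)
            if a < f(body[i], body[j]) <= b}

def overlap(predicted, empirical):
    """Q = M / K; taken to be 0 when no link is proposed (assumption)."""
    return len(predicted & empirical) / len(predicted) if predicted else 0.0

def best_interval(body, empirical, f, k_adbm):
    """Try all relevant (a, b), keeping only solutions with K in [L - t, L + t]."""
    L = len(empirical)
    t = abs(k_adbm - L)
    n = len(body)
    z = sorted({f(body[i], body[j]) for i in range(n) for j in range(n)})
    # Candidate lower bounds: one value below the smallest Z, then every Z.
    lows = [z[0] - 1.0] + z
    best = (-1.0, None, None)
    for a in lows:
        for b in z:
            if b <= a:
                continue
            pred = predict_links(body, f, a, b)
            if L - t <= len(pred) <= L + t:
                q = overlap(pred, empirical)
                if q > best[0]:
                    best = (q, a, b)
    return best

# Toy data (hypothetical): each predator eats the species half its size.
body = [1.0, 2.0, 4.0, 8.0]
empirical = {(1, 0), (2, 1), (3, 2)}
print(best_interval(body, empirical, MODELS["Ratio"], k_adbm=3))
```

Only the observed values of $Z_{ij}$ need to be tried as interval endpoints, because the set of predicted links changes only when a or b crosses one of them.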

p-value: a random digraph 

Another way to assess the goodness of a given Q is to associate a p-value 
to it. This quantity expresses the probability of obtaining a result that is 
equally good or better using a null model. In the remainder of the section I 
derive analytically such a p-value when the null model is a random digraph, 
while in the Appendix I derive the p-value when the null model is a cascade 
model or a group-based random digraph. I chose these models because they 
share the same derivation, and are in a continuum of complexity that makes 
the comparison easier.



A random digraph (Erdős and Rényi 1960) is the simplest possible way to produce networks: it takes just two parameters (S, the number of nodes in the network, standing for species, and p, the probability that two species are connected by a feeding relation) and produces a network by connecting any two species with a directed link with probability p. We want to know the probability $P(M, K \mid S, p, N(S, L))$ that a random graph using parameters S and p produces a network with K links, of which M match those of an empirical network N that contains S species and L links. We can start by writing the probability that the random graph produces exactly K links. This is a binomial probability mass function (pmf):

$$P(K \mid S, p, N(S, L)) = \binom{S^2}{K} p^K (1 - p)^{S^2 - K} \qquad (3)$$

If we set $p = L/S^2$ we maximize the probability of obtaining L links in the generated network (this is also the maximum likelihood estimate for the parameter). Once we know that the graph has produced K links, we can compute the probability that M of these match those of the empirical network N(S, L) using a hypergeometric distribution:

$$P(M \mid S, p, N(S, L), K) = \frac{\binom{L}{M}\binom{S^2 - L}{K - M}}{\binom{S^2}{K}} \qquad (4)$$

The joint bivariate pmf becomes:

$$P(M, K \mid S, p, N(S, L)) = p^K (1 - p)^{S^2 - K} \binom{L}{M}\binom{S^2 - L}{K - M} \qquad (5)$$

This pmf assumes values for $K \in \{0, \ldots, S^2\}$ and, for each K, $M \in \{0, \ldots, \min(K, L)\}$. We can therefore describe the pmf in a table with $(S^2 + 1)(L + 1) - L(L + 1)/2$ values associated with all the possible combinations of K and M. An example of such a table is reported in Figure 1 for a small network. The table expressing the bivariate probability mass function shows the probability of obtaining any combination of K and M. Because we are interested in the pmf for Q = M/K, we can map the results from the bivariate pmf into a univariate distribution by summing the probabilities for all the combinations of K and M leading to the same Q. For example, in Figure 1 I report the first few rows of such a table. From this, one can draw the complete pmf for Q.
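The construction of the table and its condensation into a pmf for Q can be sketched as follows. This is a minimal illustration, assuming the joint pmf is the product of a binomial term for K and a hypergeometric term for M given K, with $p = L/S^2$; the function names are hypothetical, and the case K = 0 (where M/K is undefined) is assigned Q = 0 by assumption. The values S = 4 and L = 5 are chosen here only because they give 87 table rows, the number quoted in the caption of Figure 1; the text does not state which network the figure uses.

```python
from fractions import Fraction
from math import comb

def joint_pmf(S, L):
    """P(M, K) for a random digraph with p = L / S^2, for every valid (K, M)."""
    n = S * S
    p = L / n
    table = {}
    for K in range(n + 1):
        base = p ** K * (1 - p) ** (n - K)
        for M in range(min(K, L) + 1):
            # comb() returns 0 when K - M exceeds the n - L unmatched cells.
            table[(K, M)] = base * comb(L, M) * comb(n - L, K - M)
    return table

def overlap_pmf(table):
    """Sum the probabilities of all (K, M) pairs that lead to the same Q."""
    pmf = {}
    for (K, M), prob in table.items():
        q = Fraction(M, K) if K > 0 else Fraction(0)  # assumption for K = 0
        pmf[q] = pmf.get(q, 0.0) + prob
    return pmf

def p_value(pmf, q_obs):
    """Probability of an overlap equal to or greater than q_obs."""
    return sum(prob for q, prob in pmf.items() if q >= q_obs)

table = joint_pmf(4, 5)
pmf = overlap_pmf(table)
print(len(table), p_value(pmf, Fraction(1, 2)))
```

Using exact fractions as keys ensures that, for instance, M/K = 1/2 and 2/4 are merged into the same value of Q.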

Deriving the probability of reproducing exactly the data shows the relation between Q and the likelihood. In fact, the likelihood can be seen as the probability of having Q = 1 when M = K = L. By substituting in Eq. 5 we obtain:

$$P(L, L \mid S, p, N(S, L)) = p^L (1 - p)^{S^2 - L} = \mathcal{L}(S, p \mid N(S, L)) \qquad (6)$$

We can also readily write the expression for the AIC, whose values will be used in the Discussion. The number of parameters of the model is θ = 2. Akaike's Information Criterion (Akaike 1974) becomes

$$AIC = 2\theta - 2\log\mathcal{L} = 4 - 2\left(L \log(p) + (S^2 - L)\log(1 - p)\right) \qquad (7)$$



Results 



I replicated the results obtained by Petchey et al. (2008) for nine food webs: Benguela Pelagic (Yodzis 1998), Broadstone Stream (Woodward and Hildrew 2001), Scotch Broom (Memmott et al. 2000), Carpinteria Salt Marsh (Lafferty et al. 2006), Coachella Valley (Polis 1991), Sierra Lakes (Harper-Smith et al. 2005), Skipwith Pond (Warren 1989), Tuesday Lake (Jonsson et al. 2005) and Ythan Estuary (Hall and Raffaelli 1991). The optimized parameters for the ADBM were taken from the original article (Petchey et al. 2008) so that, for the produced network, the number of connections matches that of the corresponding empirical network and the overlap is maximized. I then analyzed the same networks using the four simple models based on body sizes presented above. Table 1 reports all the overlap values. In three cases the ADBM is the best performing model (including 2 ties). In the other cases one or more models have higher Q than the ADBM. Each of the four models produces the highest Q in three cases (including ties). For the "Broom" system all four models have better overlap than the ADBM. The "Diff" model shows Q values higher than or equal to those of the ADBM for 5 networks, and the "LogRatio" model for 4. The other two models yield higher or equal values in 3 cases.

In all cases the results are quite similar to those produced by the ADBM, as confirmed when the exact location of predicted and non-predicted links is examined (Figures 2 and 3): the models tend to correctly predict the same links and fail in the same regions of the matrix. The similarity with the ADBM is particularly pronounced for the "Ratio" and "LogRatio" models, while the "Diff" model tends to select a different set of links compared to the other models. In no case did any of the models predict exactly the same links.

Note however that the four simpler models optimize 2 parameters, while the ADBM requires 4 parameters. The ADBM is therefore more flexible, and this should lead to better performance. How can we then fairly compare the models? If these were probabilistic models, then we could use for example AIC (or BIC, or any other selection criterion) to balance model performance and complexity. No simple solution, however, exists for deterministic models. One possibility is therefore to make the models probabilistic. This can be done in a straightforward way. Every time a deterministic model would draw a link, we can instead draw it with probability $q_1$. If the deterministic model does not predict a link, we can still draw it in the probabilistic counterpart with probability $q_2$. Deriving the likelihood for such a process is a simple extension of that of the models presented above, and we can see that the maximum likelihood estimates for $q_1$ and $q_2$ are $\hat{q}_1 = M/K$ and $\hat{q}_2 = (L - M)/(S^2 - K)$ respectively. While this modification makes all the models general (i.e. they can produce any network), it also negatively affects the expected Q value. For a deterministic model X that proposes K links of which M are present in the empirical network, the expected Q for its probabilistic version X' is:

$$E[Q_{X'}] = \frac{M q_1 + (L - M) q_2}{K q_1 + (S^2 - K) q_2} = \frac{M^2 S^2 + L^2 K - 2LMK}{LK(S^2 - K)}$$


For example, if the ADBM yields $Q_{ADBM} = 0.57143$ for the Benguela food web in the deterministic case, the probabilistic version yields $E[Q_{ADBM'}] = 0.37843$, a decrease of 1/3 in performance. Nevertheless, this allows a fair comparison among the models by means, for example, of AIC. The values are reported in Table 2. When we account for model complexity, the probabilistic version of the ADBM never yields the best AIC; the "Diff" model has the best value in 4 cases and the remaining 5 cases are split among the remaining models. The use of AIC also allows the use of "Akaike weights" (Burnham and Anderson 2002). These quantities provide a measure of the strength of the evidence for each model. The results are reported in Table 2 and show that we can say with confidence that the ADBM is not the best among the models in all cases but three (Benguela, Skipwith and Tuesday, A.W. > 0.05). In no case do we find strong evidence for the ADBM (A.W. > 0.95).
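Both quantities used in this comparison are simple to compute, and a short sketch may help in checking the numbers. It assumes the expected overlap is the ratio of expected matched links to expected proposed links with $q_1 = M/K$ and $q_2 = (L - M)/(S^2 - K)$, and the standard definition of Akaike weights (Burnham and Anderson 2002). The function names are hypothetical. The values K = 189 and M = 108 for the Benguela web are not given in the text: they are inferred here from Q = 0.57143 = 108/189 and K ≈ L = 191.

```python
from math import exp

def expected_overlap(M, K, L, S):
    """E[Q] of the probabilistic version, using q1 = M/K, q2 = (L-M)/(S^2-K)."""
    q1 = M / K
    q2 = (L - M) / (S * S - K)
    return (M * q1 + (L - M) * q2) / (K * q1 + (S * S - K) * q2)

def akaike_weights(aics):
    """Relative support for each model given a list of AIC values."""
    best = min(aics)
    rel = [exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

# Benguela: S = 29, L = 191 (Table 1); K = 189, M = 108 are inferred values.
print(round(expected_overlap(108, 189, 191, 29), 5))

# Benguela row of Table 2: ADBM', Diff', Ratio', LogRatio', DiffRatio'.
weights = akaike_weights([825.25, 880.56, 821.08, 831.32, 842.5])
print(round(weights[0], 3))
```

Under these assumptions the first value agrees with the 0.37843 quoted above, and the weight of the ADBM agrees with the 1.100E-01 reported for Benguela in Table 2.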

I also computed the probability of obtaining a Q that is greater than or equal to that of the ADBM using the random graph, cascade model and group-based model (Methods, Appendix). In all these cases, I chose parameters that a) made the expected number of links E[K] = L and b) minimized the AIC. Note that this optimization does not target the overlap directly. For the random graph the optimization is simply done by setting $p = L/S^2$. For the cascade model, I used a genetic algorithm to search for the hierarchy that maximized the likelihood. The two parameters were set to $p_U = 2L_U/(S(S - 1))$ and $p_L = 2L_L/(S(S + 1))$ to maximize the likelihood and obtain on average L links. The same type of search can be performed for the group-based random graph. Here too, I tried to find the configuration with the minimum AIC. While in the cascade model the number of parameters is fixed (and therefore maximizing the likelihood minimizes AIC), in this model the number of parameters varies according to the number of groups γ. I therefore searched, following Allesina and Pascual (2009), for the balance between the number of parameters and goodness of fit using AIC (Akaike 1974). The results in terms of likelihoods, number of parameters and AIC values are reported in Table 3.

For each model, I computed the expected overlap with the data (E[Q]) and the probability that a model x produces an overlap value equal to or greater than that of the ADBM ($P(Q_x \ge Q_{ADBM})$) (Table 1). I computed these quantities analytically for the random graph (RND) and cascade (CASC) models. Because listing all combinations for the group-based case (GROUP) is not computationally feasible, I constructed $10^5$ networks for each data set using this model, and I measured the overlap in this set of generated networks.
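The simulation step for the group-based model can be sketched as below. This is an illustrative Monte Carlo estimate under stated assumptions, not the original code: the function names, the encoding of a network as a set of (resource, consumer) index pairs, the convention Q = 0 for an empty network, and the toy inputs are all hypothetical.

```python
import random

def sample_group_digraph(groups, p, rng):
    """Draw one network: link i -> j appears with probability p[g_i][g_j]."""
    n = len(groups)
    return {(i, j) for i in range(n) for j in range(n)
            if rng.random() < p[groups[i]][groups[j]]}

def mc_p_value(groups, p, empirical, q_ref, n_draws=1000, seed=0):
    """Estimate E[Q] and P(Q >= q_ref) from n_draws generated networks."""
    rng = random.Random(seed)
    hits = 0
    total_q = 0.0
    for _ in range(n_draws):
        net = sample_group_digraph(groups, p, rng)
        q = len(net & empirical) / len(net) if net else 0.0  # assumption if K = 0
        total_q += q
        hits += q >= q_ref
    return total_q / n_draws, hits / n_draws

# Toy example (hypothetical): two groups of two species each.
groups = [0, 0, 1, 1]
empirical = {(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
p = [[0.0, 1.0], [0.0, 0.25]]  # within-group-1 links appear with probability 1/4
print(mc_p_value(groups, p, empirical, q_ref=0.8))
```

The precision of the estimated probability depends on the number of draws; the text uses $10^5$ networks per data set.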

Discussion 

I contrasted the ADBM with four deterministic models that retain intervality 
(predators prey upon consecutive species) and information on body sizes, 
but do not include optimal foraging and allometric scaling. I found that 
these models perform as well as or even better than the ADBM. This is
true regardless of the specific analysis performed (i.e. Q values, AIC of the
probabilistic counterpart of each model, Akaike weights, direct inspection 
of the predicted links). The results indicate that including allometry and 
optimal foraging, although biologically realistic, does not improve the fit 
to the data. This can be happening either because these features do not 
leave a strong signature in food web structure or because they have not been 
correctly included in the models. Also, the similarity among the results of 
the simpler models (especially "Ratio" and "LogRatio") and the ADBM is 
so strong that one may suspect that the results of the ADBM are totally 
driven by simpler mechanisms. In particular, intervality accounts for most 
of the successes and failures of these simple models in predicting links. Note 
however that possibly using body size is not the way of ordering the species 
that maximizes intervality: if we were to find the best species' trait that 
maximizes diet intervality, we could build models such as the ones illustrated 
above that would yield a better fit to the empirical data. 

By examining p-values I found that the ADBM performs, in terms of overlap, significantly better than the random digraph in all cases ($P(Q_{RND} \ge Q_{ADBM}) \ll 0.05$). With respect to the cascade model presented in the Appendix, the ADBM performs significantly better in 7 cases, and yields non-significant results in two cases (Broom, P > 0.06 and Skipwith, P > 0.45). The group-based model performs significantly better than the ADBM (P ≈ 1.0 in all cases). These results are also exactly reflected in
the expected values for the overlap of the three models: the random graph
on average presents much lower overlap than the ADBM (mean difference between models = −0.232), the cascade is better than the random (mean difference with the ADBM = −0.13) and the group-based model does much better than the ADBM (mean difference = 0.292). These results are hardly surprising, given that they mirror perfectly the complexity of the models: the random and cascade have fewer parameters than the ADBM, while the group-based has many more. AIC (or BIC, or other criteria) for probabilistic models can deal with the assessment of the goodness of fit of a model accounting for both its performance and its complexity: a model has to do much better in terms of performance to justify a greater number of parameters. AIC is well rooted in information theory, being a measure of information loss when the model is used instead of the data. Of the three probabilistic models
presented here, the group-based has better overlap, likelihood and AIC in all 
cases (Table 4). Note that the AIC for the probabilistic version of the ADBM 
presented above is worse than that of the random case in 5 cases, and worse 
than the cascade in all cases. This means that the straightforward way of 
making the model probabilistic greatly hampers its performance. Producing 
a better model grounded in optimal foraging theory that is probabilistic in 
nature is definitely possible, and should be pursued to test whether these 
mechanisms could contribute to our understanding of network structure. 

The results of this exercise also show that measuring overlaps without 
a quantitative comparison with other models is far from being satisfactory. 
Accepting these numbers at face value without including the probability of
obtaining them using simpler models or even at random can lead us to finding 
patterns and results that vanish once we scrutinize the models in detail. 
In order to test whether and how optimal foraging, allometric scaling or 
any other mechanism do influence food web structure, a rigorous statistical 
analysis such as the one presented here is required. Based on the data, one 
can conclude that in order to prove that optimal foraging and allometric 
scaling are important for food web structure, they need to be embedded in 
better models than the current ones. In the meantime, for lack of a better 
alternative we cannot reject the null hypothesis that these forces play no role 
in shaping food webs. 
Figure 1: Building the exact probability mass distribution for the overlap
of links using a random digraph. First, evaluate relevant parameters (left). 
Then, build a table for all the possible combinations of K and M (center, 
just 10 of the 87 rows presented). Finally, condense the table according to 
Q, creating a univariate pmf (right). 

Appendix 

p-value: a cascade model 

Here I repeat the analysis above for a version of the cascade model. The cascade model was the first probabilistic model for food web structure to be proposed (Cohen et al. 1990). I examine here a simple variation on the original model. To produce a network, a vector H representing a hierarchy (an order) of the species is required. If we order the empirical network according to H, we can divide the links in the network into two classes: a) connections from lower ranked species to higher ranked species (forward connections) and b) connections from higher to lower or equal ranked species (backward connections). In the adjacency matrix associated with the ordered network, the forward connections are contained in the upper triangular part of the matrix, while the backward connections lie either on the lower triangular
Figure 2: Benguela Pelagic food web. For each model, I report the links
correctly predicted (black), those incorrectly predicted (red) and those not 
predicted by the model but present in the empirical web (blue). 
Figure 3: Ythan Estuary food web. For each model, I report the links
correctly predicted (black), those incorrectly predicted (red) and those not 
predicted by the model but present in the empirical web (blue). 
| Food Web | S | L | Q ADBM | Q Diff | Q Ratio | Q LogRatio | Q DiffRatio |
|---|---|---|---|---|---|---|---|
| Benguela | 29 | 191 | 0.57143 | 0.48705 | 0.56771 | 0.557895 | 0.54497 |
| Broadstone | 29 | 156 | 0.40385 | 0.42308 | 0.38461 | 0.384615 | 0.40385 |
| Broom | 68 | 101 | 0.07767 | 0.1 | 0.13592 | 0.137255 | 0.09804 |
| Carpinteria | 72 | 238 | 0.16456 | 0.21429 | 0.16318 | 0.172996 | 0.15900 |
| Coachella | 26 | 228 | 0.65065 | 0.52863 | 0.63877 | 0.656388 | 0.57205 |
| Sierra | 33 | 175 | 0.60366 | 0.61047 | 0.50610 | 0.487805 | 0.55758 |
| Skipwith | 71 | 347 | 0.13833 | 0.12680 | 0.13256 | 0.132565 | 0.13833 |
| Tuesday | 73 | 410 | 0.46472 | 0.40146 | 0.46472 | 0.462287 | 0.43796 |
| Ythan | 88 | 425 | 0.18824 | 0.21177 | 0.20235 | 0.202353 | 0.17412 |



Table 1: Overlap values for the ADBM and the four simpler models based 
on body size described in the text. 



| Food Web | AIC ADBM' | AIC Diff' | AIC Ratio' | AIC LogRatio' | AIC DiffRatio' | A.W. ADBM' |
|---|---|---|---|---|---|---|
| Benguela | 825.25 | 880.56 | 821.08 | 831.32 | 842.5 | 1.100E-01 |
| Broadstone | 824.63 | 811.5 | 829.04 | 829.04 | 820.63 | 1.400E-03 |
| Broom | 1110.72 | 1100.12 | 1085.6 | 1085.34 | 1100.45 | 1.640E-06 |
| Carpinteria | 2036.7 | 1991.09 | 2033.25 | 2026.37 | 2036.29 | 1.250E-10 |
| Coachella | 777.31 | 869.37 | 786.82 | 769.85 | 840.33 | 2.340E-02 |
| Sierra | 823.69 | 798.66 | 900.53 | 913.57 | 858.9 | 3.670E-06 |
| Skipwith | 2658.12 | 2660.53 | 2657.44 | 2657.44 | 2654.12 | 8.700E-02 |
| Tuesday | 2516.69 | 2654.99 | 2512.69 | 2518.53 | 2575.28 | 1.140E-01 |
| Ythan | 3380.31 | 3343.62 | 3357.12 | 3357.12 | 3394.18 | 1.070E-08 |



Table 2: AIC values for the probabilistic extensions of the ADBM and the 
other four simpler models described in the text. The AIC accounts for the 
number of parameters as well as the goodness of fit. Akaike weights (A.W.) 
measure the confidence that the ADBM is the best among the examined 
models. 
| Food Web | log L RND | AIC RND | log L CASC | AIC CASC | log L GROUP | AIC GROUP | γ |
|---|---|---|---|---|---|---|---|
| Coachella | -431.11 | 866.22 | -350.4535 | 704.907 | -141.637 | 411.274 | 8 |
| Benguela | -449.575 | 903.15 | -364.77 | 733.54 | -189.4495 | 476.899 | 7 |
| Broadstone | -402.36 | 808.72 | -363.1855 | 730.371 | -99.9415 | 271.883 | 6 |
| Sierra | -479.06 | 962.12 | -388.266 | 780.532 | -107.6645 | 313.329 | 7 |
| Broom | -485.1 | 974.2 | -480.4365 | 964.873 | -279.8005 | 657.601 | 7 |
| Skipwith | -1262.36 | 2528.72 | -1097.235 | 2198.47 | -559 | 1360 | 11 |
| Carpinteria | -964.745 | 1933.49 | -862.555 | 1729.11 | -536.475 | 1272.95 | 10 |
| Tuesday | -1444.36 | 2892.72 | -1254.93 | 2513.86 | -295.6795 | 833.359 | 11 |
| Ythan | -1645.715 | 3295.43 | -1444.84 | 2893.68 | -828.445 | 1898.89 | 11 |



Table 4: Likelihood and AIC values for all the networks using the three probabilistic models described in the main text. The AIC takes into account the number of parameters, which is 2 for the random digraph (RND), 2 + S for the cascade model (CASC) and 2 + S + γ² for the group-based random digraph. Because γ varies among networks, its value is reported as well.



part or on the diagonal. Having set the number of species and the hierarchy among them, we connect species in the following way: we draw forward connections with probability $p_U$ and backward connections with probability $p_L$. We define $L_U$ as the number of links in the upper triangular part of the empirical network, $L_L$ as the number of links in the lower part, $K_U$ and $K_L$ as the number of links proposed by the model in the upper and lower part, and $M_U$ and $M_L$ as the matched links. Given the derivation for the random graph, it is straightforward to write the probability mass function for this case:



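$$P(M_U, M_L, K_U, K_L \mid S, p_U, p_L, H, N(S, L_U, L_L)) = p_U^{K_U}(1 - p_U)^{\frac{S(S-1)}{2} - K_U}\binom{L_U}{M_U}\binom{\frac{S(S-1)}{2} - L_U}{K_U - M_U} \; p_L^{K_L}(1 - p_L)^{\frac{S(S+1)}{2} - K_L}\binom{L_L}{M_L}\binom{\frac{S(S+1)}{2} - L_L}{K_L - M_L} \qquad (9)$$

where $K_U \in \{0, \ldots, S(S-1)/2\}$ and, for each $K_U$, $M_U \in \{0, \ldots, \min(K_U, L_U)\}$, while $K_L \in \{0, \ldots, S(S+1)/2\}$ and $M_L \in \{0, \ldots, \min(K_L, L_L)\}$. The total number of combinations for the four values of interest can therefore be quite large:

$$\left[\left(\tfrac{S(S-1)}{2} + 1\right)(L_U + 1) - \tfrac{L_U(L_U + 1)}{2}\right]\left[\left(\tfrac{S(S+1)}{2} + 1\right)(L_L + 1) - \tfrac{L_L(L_L + 1)}{2}\right] \qquad (10)$$

For example, for the Ythan Estuary food web we have $L_U = 421$, $L_L = 4$ and S = 88, leading to more than $2.989 \cdot 10^{10}$ possible combinations. Although the number of combinations is very high, it is still possible to compute the univariate distribution for Q in exactly the same way as for the random graph, by condensing the multivariate distribution.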
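The count quoted for the Ythan Estuary web can be checked with a few lines. The sketch assumes that each triangular block contributes the same number of (K, M) pairs as a random digraph with the corresponding number of cells, that is $(c + 1)(l + 1) - l(l + 1)/2$ for c cells and l empirical links, and that the two blocks combine by multiplication; the function names are hypothetical.

```python
def n_pairs(cells, links):
    """Number of (K, M) pairs with 0 <= K <= cells and 0 <= M <= min(K, links)."""
    return (cells + 1) * (links + 1) - links * (links + 1) // 2

def cascade_combinations(S, L_U, L_L):
    """Combinations of (K_U, M_U, K_L, M_L) for the cascade variation."""
    upper = S * (S - 1) // 2  # cells above the diagonal
    lower = S * (S + 1) // 2  # cells on or below the diagonal
    return n_pairs(upper, L_U) * n_pairs(lower, L_L)

# Ythan Estuary: S = 88, L_U = 421, L_L = 4.
print(cascade_combinations(88, 421, 4))
```

Under these assumptions the result is 29,891,162,025, consistent with the "more than $2.989 \cdot 10^{10}$" stated above.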



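Also for this model one can easily derive the likelihood by setting $K_U = M_U = L_U$ and $K_L = M_L = L_L$:

$$\mathcal{L}(S, p_U, p_L, H \mid N(S, L_U, L_L)) = p_U^{L_U}(1 - p_U)^{\frac{S(S-1)}{2} - L_U} \, p_L^{L_L}(1 - p_L)^{\frac{S(S+1)}{2} - L_L} \qquad (11)$$

And the AIC:

$$AIC = 6 + 2S - 2\left[L_U \log(p_U) + \left(\tfrac{S(S-1)}{2} - L_U\right)\log(1 - p_U) + L_L \log(p_L) + \left(\tfrac{S(S+1)}{2} - L_L\right)\log(1 - p_L)\right] \qquad (12)$$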
p-value: a group-based random digraph

Finally, I derive here the probability of obtaining any Q for a model that is a collection of random digraphs in which species interact according to the "group" they belong to (Allesina and Pascual 2009). For example, if we divide the nodes of a network into two groups ("red" and "green"), we will use four probabilities for deciding whether to connect a red node to a red node ($p_{rr}$), a red node to a green node ($p_{rg}$), a green to a green ($p_{gg}$) and a green to a red ($p_{gr}$). The number of probabilities required will therefore be $\gamma^2$, where γ is the number of groups. This model is simply a collection of random subgraphs. We first define a vector G containing, for each species, the group the species is assigned to. We further define $L_{ij}$ as the number of links in the empirical network connecting resources belonging to the i-th group to consumers belonging to the j-th group, $K_{ij}$ as the number of links proposed by the model for the interaction between these groups and $M_{ij}$ as the matched links. Finally, we write $\langle i \rangle$ for the size of the i-th group. We can now write the multivariate pmf for all combinations of $K_{ij}$ and $M_{ij}$:



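$$P(K_{ij}, M_{ij} \mid S, p_{ij}, G, N(S, L_{ij})) = \prod_{i=1}^{\gamma}\prod_{j=1}^{\gamma} p_{ij}^{K_{ij}}(1 - p_{ij})^{\langle i \rangle\langle j \rangle - K_{ij}}\binom{L_{ij}}{M_{ij}}\binom{\langle i \rangle\langle j \rangle - L_{ij}}{K_{ij} - M_{ij}} \qquad (13)$$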



Note that the model is conceptually very simple: in the case γ = 1 the model reduces to the random digraph described above. Although listing all the possible cases is theoretically feasible, their number can be immense:

$$\text{Num. cases} = \prod_{i=1}^{\gamma}\prod_{j=1}^{\gamma}\left[(\langle i \rangle\langle j \rangle + 1)(L_{ij} + 1) - \frac{L_{ij}(L_{ij} + 1)}{2}\right] \qquad (14)$$



For example, for the Coachella Valley food web examined below, I found more than $10^{43}$ possible combinations, so that obtaining the exact distribution is not computationally feasible. Nevertheless, as for the other cases, the likelihood and the AIC are readily derived and easy to compute:

$$\mathcal{L}(S, p_{ij}, G \mid N(S, L_{ij})) = \prod_{i=1}^{\gamma}\prod_{j=1}^{\gamma} p_{ij}^{L_{ij}}(1 - p_{ij})^{\langle i \rangle\langle j \rangle - L_{ij}} \qquad (15)$$

$$AIC = 2 + 2S + 2\gamma^2 - 2\sum_{i=1}^{\gamma}\sum_{j=1}^{\gamma}\left[L_{ij}\log(p_{ij}) + (\langle i \rangle\langle j \rangle - L_{ij})\log(1 - p_{ij})\right] \qquad (16)$$